Abstract

We study the following synchronous process that we call repeated balls-into-bins. The 
process is started by assigning n balls to n bins in an arbitrary way. In every subsequent 
round, from each non-empty bin one ball is chosen according to some fixed strategy (random, 
FIFO, etc), and re-assigned to one of the n bins uniformly at random. 

We call a configuration legitimate if its maximum load is $O(\log n)$. We prove that,
starting from any configuration, the process converges to a legitimate configuration in
linear time and then takes on only legitimate configurations over a period of length
bounded by any polynomial in n, with high probability (w.h.p.). This implies that the process
is self-stabilizing and that every ball traverses all bins in $O(n \log^2 n)$ rounds, w.h.p.


Keywords: Balls into Bins, Self-Stabilizing Systems, Markov Chains, Parallel Resource Assignment.
1 Introduction


We study the following repeated balls-into-bins process. Given any $n \geq 2$, we initially assign n
balls to n bins in an arbitrary way. Then, at every round, from each non-empty bin one ball 
is chosen according to some strategy (random, FIFO, etc) and re-assigned to one of the n bins 
uniformly at random. Every ball thus performs a sort of delayed random walk over the bins 
and the delays of such random walks depend on the size of the bin queues encountered during 
their paths. It thus follows that these random walks are correlated. We study the impact of 
such correlation on the maximum load. This process can also be seen as a random-walk based 
protocol for parallel resource (or task) assignment in distributed systems [501135] . 
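The process described above is easy to simulate when only the loads matter, since balls are anonymous for that purpose. The following is a minimal sketch, not the authors' code: the function names `rbb_round` and `simulate_max_load`, the worst-case start with all balls in bin 0, and the parameter values are illustrative assumptions.

```python
import random

def rbb_round(loads, rng):
    """One synchronous round of the repeated balls-into-bins process."""
    n = len(loads)
    released = sum(1 for q in loads if q > 0)           # one ball leaves each non-empty bin
    new_loads = [q - 1 if q > 0 else 0 for q in loads]
    for _ in range(released):                           # each released ball picks a bin u.a.r.
        new_loads[rng.randrange(n)] += 1
    return new_loads

def simulate_max_load(n, rounds, seed=0):
    """Start with all n balls in bin 0 (an assumed worst case) and
    record the maximum load after every round."""
    rng = random.Random(seed)
    loads = [n] + [0] * (n - 1)
    history = []
    for _ in range(rounds):
        loads = rbb_round(loads, rng)
        history.append(max(loads))
    return loads, history

if __name__ == "__main__":
    loads, history = simulate_max_load(n=200, rounds=2000)
    print("max load after 1, 500, 2000 rounds:", history[0], history[499], history[-1])
```

The number of balls is invariant across rounds, which is the global property that the analysis in this paper exploits.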

Inspired by previous notions of (load) stability [2lfl0], we study the maximum load $M^{(t)}$, i.e.,
the maximum number of balls inside one bin at round t, and we are interested in the largest $M^{(t)}$
achieved by the process over a period of any polynomial length. We say that a configuration is
legitimate if its maximum load is $O(\log n)$ and a process is stable if, starting from any legitimate
configuration, it only takes on legitimate configurations over a period of poly(n) length, w.h.p.
We also investigate a probabilistic version of self-stabilization nans]: we say that a process 
is self-stabilizing if it is stable and if, moreover, starting from any configuration, it converges 
to a legitimate configuration, w.h.p. The convergence time of a self-stabilizing process is the 
maximum number of rounds required to reach a legitimate configuration starting from any 
configuration. This natural notion of (probabilistic) self-stabilization has also been inspired by 
that in [23] for other distributed processes. 

Stability has consequences for other important aspects of this process. For instance, if the 
process is stable, we can get good upper bounds on the progress of a ball, namely the number
of rounds in which the ball is selected from its current bin queue, along a sequence of $t \geq 1$ rounds.
Furthermore, we can eventually bound the parallel cover time, i.e., the time required for every
ball to visit all bins. Self-stabilization also has important consequences when the system is
prone to transient faults mmm- 

To the best of our knowledge, the repeated balls-into-bins process was first studied in [9],
where it is used as a crucial sub-procedure to optimize the message complexity of a gossip
algorithm in the complete graph, and then in BED]. The analyses in [21 [22 hold (only) for
very short (i.e. logarithmic) periods, while the analysis in [7] considers periods of arbitrary
length but (only) yields a bound on the maximum load that rapidly increases
with time: after t rounds, the maximum load is bounded by $O(\sqrt{t})$ w.h.p. By adopting the
FIFO strategy at every bin queue, the latter result easily implies that the progress of any ball
is $\Omega(\sqrt{t})$ w.h.p. On the other hand, an upper bound $O(n^2 \log n)$ for the parallel cover time of
the repeated balls-into-bins process easily follows from the fact that the cover time of one single
random walk on the complete graph is $O(n \log n)$ w.h.p.
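The two implications stated in this paragraph can be spelled out in a few lines. This is a sketch, assuming FIFO queues (so a ball waits at most as many rounds as the load of the bin it enters); the constant $C$ hidden in the load bound and the symbols $\mathrm{steps}(t)$ and $T_{\mathrm{cover}}$ are introduced here for illustration only.

```latex
% Progress over t rounds, assuming FIFO and M^{(s)} \le C\sqrt{t} for all s \le t:
% every step of the ball's walk costs at most C\sqrt{t} rounds of waiting.
\mathrm{steps}(t) \;\ge\; \left\lfloor \frac{t}{C\sqrt{t}} \right\rfloor \;=\; \Omega(\sqrt{t}).

% Parallel cover time: a bin never holds more than n balls, so under FIFO each step
% costs at most n rounds, and a single walk covers the complete graph in
% O(n \log n) steps w.h.p.
T_{\mathrm{cover}} \;\le\; n \cdot O(n \log n) \;=\; O(n^{2} \log n).
```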

Previous results are thus not helpful to establish whether this process is stable (or, even 
more, self-stabilizing) or not. Moreover, the previous analyses of the maximum load in [71121120] 
are far from tight, since they rely on some rough approximations of the studied process via 
other, much simpler Markov chains: for instance, in [7], the authors consider the process - 
which clearly dominates the original one - where, at every round, a new ball is inserted in every 
empty bin. That analysis thus does not exploit the global invariant (a fixed number n of balls) 
of the original process. 

Our Results. We provide a new, tight analysis of the repeated balls-into-bins process that 
significantly departs from previous ones and show that the system is self-stabilizing. We prove 
that, for any arbitrarily-large constant c, if the process starts from a legitimate configuration, 
then the maximum load $M^{(t)}$ is $O(\log n)$ for all $t = O(n^c)$, w.h.p. Moreover, starting from any
configuration, the system reaches a legitimate configuration within $O(n)$ rounds, w.h.p.
Our result strongly improves over the best previous bounds Biatzni and it is almost tight,
since the classical lower bound $\Omega(\log n / \log\log n)$ on the maximum load (see, e.g., [33]) clearly
applies also in our repeated setting. Our result further implies that, under the FIFO queueing
policy, any ball performs $\Omega(t / \log n)$ steps of its individual random walk over any sequence of
$t = \mathrm{poly}(n)$ rounds w.h.p., so the parallel cover time is $O(n \log^2 n)$ w.h.p. This is only a $\log n$
factor away from the lower bound following from the single-ball process.

Besides being interesting in their own right, balls-into-bins processes are used to model and 
analyze several important randomized protocols in parallel and distributed computing [SMBS]- 
In particular, the process we study models a natural randomized solution to the problem of 
(parallel) resource (or task) assignment in distributed systems (this problem is also known as 
traversal ) |3L)ll35j . In the basic case, the goal is to assign one resource in mutual exclusion to 
all processors (i.e. nodes) of a distributed system. This is typically described as a traversal 
process performed by a token (representing the resource or task) over the network. The process 
terminates when the token has visited all nodes of the system. Randomized protocols for this 
problem m are efficient approaches when, for instance, the network is prone to faults/changes 
and/or when there is no global labeling of the nodes. 

A simple randomized protocol is the one based on random walks [1411241125] : starting from 
any node, the token performs a random walk over the network until all nodes are visited, w.h.p. 
The first round in which all nodes have been visited by the token is called the cover time of 
the random walk [141129]. The expected cover time for general graphs is $O(|V| \cdot |E|)$ (see, for
example, (Ml)-

In distributed systems, we often are in the presence of several resources or tasks that must 
be processed by every node in parallel. This naturally leads us to consider the parallel version
of the basic problem in which n different tokens (resources) are initially distributed over the 
set of nodes and every token must visit all nodes of the network. Similarly to the basic case, 
an efficient randomized solution is the one based on (parallel) random walks. In order to visit 
the nodes, every token performs a random walk under the constraint that every node can 
process and release at most one token per round. Again, maximum load is a critical complexity 
measure: for instance, it can determine the required buffer size at every node, bounds on the 
token progress and, thus, on the parallel cover time. 

It is easy to see that, when the graph is complete, the above protocol - based on parallel 
random walks - is in fact equivalent to the repeated balls-into-bins process analyzed in this 
paper. For this case, our results imply that every token visits all nodes of the system with at
most a logarithmic delay w.r.t. the case of a single token: so, we can derive an upper bound
$O(n \log^2 n)$ for the parallel cover time, starting from any initial configuration.

We can also consider the adversarial model in which, in some faulty rounds, an adversary 
can re-assign the tokens to the nodes in an arbitrary way. The self-stabilization and the linear
convergence time shown in Theorem 1 imply that the $O(n \log^2 n)$ bound on the cover time still
holds, provided that faulty rounds occur at most once every $cn$ rounds, for a sufficiently
large constant c.

Related Work. 

- Random Walks on Graphs. The repeated balls-into-bins process was first considered in [MM] , 
since it describes the process of performing parallel random walks in the (uniform) gossip model 
(also known as random phone-call model [15H26] ) when every message can contain at most one 
token. Maximum load (i.e., node congestion), token delays, mixing and cover times are here the 
most crucial aspects. We remark that the flavor of these studies is different from ours: indeed, 
their main goal is to keep maximum load and token delays logarithmic over some polylogarithmic
period. Their aim is to achieve a fast mixing time for every random walk in the case of good 
expander graphs. In particular, in [9], a logarithmic bound is shown for the complete graph
when $m = O(n / \log n)$ random walks are performed over a logarithmic time interval. A similar
bound is also given for some families of almost-regular random graphs in |2(J| . Finally, a new
analysis is given in [7] for regular graphs yielding the bound $O(\sqrt{t})$.

- Parallel Computing. Balls-into-bins processes have been extensively studied in the area of 
parallel and distributed computing, mainly to address balanced-allocation problems mm, 
PRAM simulation m and hashing m- In order to optimize the total number of random bin 
choices used for the allocation, further allocation strategies have been proposed and analyzed 
(see, e.g., HEMMED- As previously mentioned, our notion of stability is inspired by those 
studied in mmm where load balancing algorithms are analyzed in scenarios in which new 
tasks arrive during the run of the system, and existing jobs are executed by the processors and 
leave the system. An adversarial model for a sequential balls-into-bins process has been studied 
in [3]. We remark that, in the above previous works, the goal is different from ours: each
ball/task must be allocated to one, arbitrary bin/processor (it is not a token-traversal process). 

- Queuing Theory. To the best of our knowledge, the closest model to our setting in classical 
queuing theory is the closed Jackson network [3]. In this model, time is continuous and each node 
processes a single token among those in its queue; processing each token takes an exponentially 
distributed interval of time. As soon as its processing is completed, each token leaves the current 
node and enters the queue of a neighbor chosen uniformly at random. Notice that, since time 
is continuous, the process’ events are sequential, so that the associated Markov chain is much 
simpler than the one describing our parallel process. In particular, the stationary distribution 
of a closed Jackson network can be expressed as a product-form distribution. It is noted in [23]
that “[...] virtually all of the models that have been successfully analyzed in classical queuing 
network theory are models having a so-called product form stationary distribution”. Because of 
the above considerations regarding the difficulty of our process (especially the non-reversibility 
of its Markov chain), the stationary distribution is instead very likely not to exhibit a product- 
form distribution, thus lying outside the domain where the techniques of classical queuing
theory seem effective. We finally cite the seminal work m on adversarial queuing systems: here,
new tokens (having specified source and destination nodes) are inserted in the nodes according 
to some adversarial strategy and a notion of edge-congestion stability is investigated. 

2 Self-Stabilization of repeated balls into bins 

In order to study the maximum load of the repeated balls-into-bins process, the state of the
system is completely characterized by the load of every bin. Formally, for each bin $u \in [n]$ let
$Q_u^{(t)}$ be the r.v.¹ indicating the number of balls, i.e. the load, in u at round t. We write $\mathbf{Q}^{(t)}$ for
the vector of these random variables, i.e., $\mathbf{Q}^{(t)} = (Q_u^{(t)} : u \in [n])$. We write $\mathbf{q} = (q_1, \ldots, q_n)$
for a (load) configuration, i.e., $q_u \in \{0, 1, \ldots, n\}$ for every $u \in [n]$ and $\sum_{u=1}^{n} q_u = n$. We define
the maximum load of a configuration $\mathbf{q} = (q_1, \ldots, q_n)$ as

$$M(\mathbf{q}) = \max\{ q_u : u \in [n] \},$$

and, for brevity's sake, given any round t of the process, we define

$$M^{(t)} = M(\mathbf{Q}^{(t)}).$$

According to the above definition, we say that a configuration $\mathbf{q}$ is legitimate if $M(\mathbf{q}) \leq \beta \log n$,
for some absolute constant $\beta > 0$.

1 We always use capital letters for random variables, lower case for quantities, and bold for vectors. 
In this section we prove the main theorem of this paper.

Theorem 1. Let c be an arbitrarily large constant and let $\mathbf{q}$ be any legitimate configuration.
Let the repeated balls-into-bins process start from $\mathbf{Q}^{(0)} = \mathbf{q}$. Then, over any period of length
$O(n^c)$, the process visits only legitimate configurations, w.h.p., i.e., $M^{(t)} = O(\log n)$ for all
$t = O(n^c)$, w.h.p. Moreover, starting from any configuration, the system reaches a legitimate
configuration within $O(n)$ rounds, w.h.p.

Overview of the analysis 

In the repeated balls-into-bins process, every bin can release at most one ball per round. As a 
consequence, the random walks performed by the balls delay each other and are thus correlated 
in a way that can make bin queues larger than in the independent case. Indeed, intuitively 
speaking, a large load observed at a bin in some round makes “any” ball more likely to spend 
several future rounds in that bin, because if the ball ends up in that bin in one of the next few 
rounds, it will undergo a large delay. This is essentially the major technical issue to cope with. 

The previous approach in [7] relies on the fact that, in every round, the expected balance
between the number of incoming and outgoing balls is always non-positive for every non-empty
bin (notice that the expected number of incoming balls is always at most one). This may suggest
viewing the process as a sort of parallel birth-death process [22]. Using this approach and with
some further arguments, one can (only) get the “standard-deviation” bound $O(\sqrt{t})$ in [7]. Our
new analysis proving Theorem 1 proceeds along three main steps.

i) We first show that, after the first round, the aforementioned expected balance is always
negative, namely, not larger than $-1/4$. Indeed, the number of empty bins remains at least n/4
with (very) high probability, which is extremely useful since a bin can only receive tokens from 
non-empty bins. This fact is shown to hold starting from any configuration and over any period 
of polynomial length. 

ii) In order to exploit the above negative balance to bound the load of the bins, we need some
strong concentration bound on the number of balls entering a specific bin u along any period
of polynomial size. However, it is easy to see that, for any fixed u, the random variables
$\{Z_u^{(t)}\}_{t \geq 0}$ counting the number of balls entering bin u are not mutually independent, nor
are they negatively associated, so that we cannot apply standard tools to prove concentration
(see Appendix B for a counterexample). To address this issue, we define a simpler repeated
balls-into-bins process as follows.


Tetris process. Starting from any configuration with at least n/4 empty bins, in each 
round 

- from every non-empty bin we pick one ball and we throw it away, and 

- we pick exactly (3/4)n new balls and we put each of them independently and u.a.r. in 
one of the n bins. 
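A single round of the Tetris process, as defined above, can be sketched as follows. The function name `tetris_round` and the assumption that n is divisible by 4 (so that (3/4)n is an integer) are illustrative choices, not part of the original definition.

```python
import random

def tetris_round(loads, rng):
    """One round of the Tetris process on n bins (n assumed divisible by 4)."""
    n = len(loads)
    new_loads = [q - 1 if q > 0 else 0 for q in loads]   # discard one ball per non-empty bin
    for _ in range(3 * n // 4):                          # exactly (3/4)n fresh balls, placed u.a.r.
        new_loads[rng.randrange(n)] += 1
    return new_loads

if __name__ == "__main__":
    rng = random.Random(0)
    loads = [2, 0, 0, 1, 0, 3, 0, 2]   # 8 bins, 4 of them empty (at least n/4 = 2)
    print(tetris_round(loads, rng))
```

Unlike the original process, the total number of balls is not conserved here: the number of arrivals per round is fixed and does not depend on the current state.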


Using a coupling argument and our previous lower bound on the number of empty bins, we
prove that the maximum number of balls accumulating in a bin in the original process is not
larger than the maximum number of balls accumulating in a bin in the Tetris process, w.h.p.

iii) The Tetris process is simpler than the original one since, at every round, the number of
balls assigned to the bins does not depend on the system’s state in the previous round. Hence,
the random variables $\{\hat{Z}_u^{(t)}\}_{t \geq 0}$ counting the number of balls arriving at bin u in the Tetris process
are mutually independent. We can thus apply standard concentration bounds. On the other
hand, differently from the approximating process considered in [7], the negative balance of
incoming and outgoing balls proved in Step i) still holds, thus yielding a much smaller bound
on the maximum load than that in [7]. A probabilistic version of the Tetris process, where
the number of new balls arriving at each round is a random variable with expectation $\lambda n$, for
some $\lambda = \lambda(n) \in [0,1]$, has been recently studied in [II].

In the remainder of this section, we formally describe the above three steps, thus proving
Theorem 1.


2.1 On the number of empty bins 


We next show that the number of empty bins is at least a constant fraction of n over a very 
large time-window, w.h.p. This fact could be proved by standard concentration arguments if, 
at every round, all balls were thrown independently and uniformly at random. A little care is 
instead required in our process to properly handle, at any round, “congested” bins whose load 
exceeds 1. These bins will be surely non-empty at the next round too. So, the number of empty 
bins at a given round also depends on the number of congested bins in the previous round. 

Lemma 2. Let $\mathbf{q} = (q_1, \ldots, q_n)$ be a configuration in a given round and let X be the random
variable indicating the number of empty bins in the next round. For any large enough n, it holds
that

$$\mathbf{P}(X \leq n/4) \leq e^{-\alpha n},$$

where $\alpha$ is a suitable positive constant.

Proof. Let $a = a(\mathbf{q})$ and $b = b(\mathbf{q})$ respectively denote the number of empty bins and the number
of bins with exactly one token in configuration $\mathbf{q}$. For each bin u of the $a + b$ bins with at most
one token, let $Y_u$ be the random variable indicating whether or not bin u is empty in the next
round, so that $X = \sum_{u=1}^{a+b} Y_u$ and

$$\mathbf{P}(Y_u = 1) = \left(1 - \frac{1}{n}\right)^{n-a} \geq e^{-\frac{n-a}{n-1}},$$

where in the last inequality we used the fact that $1 - x \geq e^{-\frac{x}{1-x}}$. Hence we have that

$$\mathbf{E}[X] \geq (a + b)\, e^{-\frac{n-a}{n-1}}. \qquad (1)$$

The crucial fact is that the number of bins with two or more tokens cannot exceed the number
of empty bins, i.e. $n - (a + b) \leq a$. Thus, we can bound the number of empty bins from below²,
$a \geq (n - b)/2$, and by using that bound in (1) we get

$$\mathbf{E}[X] \geq \frac{n + b}{2}\, e^{-\frac{n+b}{2(n-1)}}.$$

Now observe that, for large enough n, a positive constant $\varepsilon$ exists such that

$$\frac{n + b}{2}\, e^{-\frac{n+b}{2(n-1)}} \geq (1 + \varepsilon)\, \frac{n}{4}$$

for every $0 \leq b \leq n$.

It is not difficult to prove that the random variables $Y_1, \ldots, Y_{a+b}$ are negatively associated (e.g.,
see Theorem 13 in iisi). Thus we can apply (see Lemma 7 in PH) the Chernoff bound © with
$\delta = \varepsilon/(1 + \varepsilon)$ to r.v. X to obtain

$$\mathbf{P}\left(X \leq \frac{n}{4}\right) \leq \exp\left(-\frac{\varepsilon^2}{4(1 + \varepsilon)} \cdot \frac{n}{2}\right).$$

□
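The existence of the constant $\varepsilon$ used in the proof can be checked numerically for a given n. The sketch below evaluates the lower bound $\frac{n+b}{2} e^{-(n+b)/(2(n-1))}$ on the expected number of empty bins over all $0 \leq b \leq n$ and compares it with n/4; the function names and the sample values of n are illustrative assumptions.

```python
import math

def expected_empty_lower_bound(n, b):
    """Lower bound (n+b)/2 * exp(-(n+b)/(2(n-1))) on E[X] from the proof of Lemma 2."""
    return (n + b) / 2 * math.exp(-(n + b) / (2 * (n - 1)))

def min_ratio(n):
    """Smallest value of the lower bound divided by n/4, over all 0 <= b <= n.
    Any value above 1 leaves room for a positive epsilon."""
    return min(expected_empty_lower_bound(n, b) / (n / 4) for b in range(n + 1))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(min_ratio(n), 4))
```

For these values of n the minimum is attained at b = 0, where the ratio equals $2 e^{-n/(2(n-1))}$, which approaches $2/\sqrt{e} \approx 1.21$ from below as n grows.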


2 Observe that this argument only works to get a lower bound on the number of empty bins and not for an 
upper bound. 
From the above lemma it easily follows that, if we look at our process over a time-window
$T = T(n)$ of polynomial size, after the first round we always see at least n/4 empty bins, w.h.p.
More formally, for every $t \in \{1, \ldots, T\}$, let $\mathcal{E}_t$ be the event “The number of empty bins at round
t is at least n/4”. From Lemma 2 and the union bound we get the following lemma.


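Lemma 3. Let $\mathbf{q}_0$ denote the initial configuration and let $T = T(n) = n^c$ for an arbitrarily large
constant c. For any large enough n it holds that

$$\mathbf{P}\left(\bigcap_{t=1}^{T} \mathcal{E}_t \,\Big|\, \mathbf{Q}^{(0)} = \mathbf{q}_0\right) \geq 1 - e^{-\gamma n},$$

where $\gamma$ is a suitable positive constant.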


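Proof. By using the union bound we have that

$$\mathbf{P}\left(\bigcap_{t=1}^{T} \mathcal{E}_t \,\Big|\, \mathbf{Q}^{(0)} = \mathbf{q}_0\right) = 1 - \mathbf{P}\left(\bigcup_{t=1}^{T} \overline{\mathcal{E}_t} \,\Big|\, \mathbf{Q}^{(0)} = \mathbf{q}_0\right) \geq 1 - \sum_{t=1}^{T} \mathbf{P}\left(\overline{\mathcal{E}_t} \mid \mathbf{Q}^{(0)} = \mathbf{q}_0\right).$$

By conditioning on the configuration at round $t - 1$, from the Markov property and Lemma 2
it then follows that

$$\mathbf{P}\left(\overline{\mathcal{E}_t} \mid \mathbf{Q}^{(0)} = \mathbf{q}_0\right) = \sum_{\mathbf{q}} \mathbf{P}\left(\overline{\mathcal{E}_t} \mid \mathbf{Q}^{(t-1)} = \mathbf{q}\right) \mathbf{P}\left(\mathbf{Q}^{(t-1)} = \mathbf{q} \mid \mathbf{Q}^{(0)} = \mathbf{q}_0\right) \leq e^{-\alpha n}.$$

Hence,

$$\mathbf{P}\left(\bigcap_{t=1}^{T} \mathcal{E}_t \,\Big|\, \mathbf{Q}^{(0)} = \mathbf{q}_0\right) \geq 1 - T e^{-\alpha n} \geq 1 - e^{-\gamma n}$$

for a suitable positive constant $\gamma$.

□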
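The last inequality of the proof absorbs the polynomial factor $T = n^c$ into the exponent. One possible way to make this explicit is sketched below; the choice $\gamma = \alpha/2$ is an illustrative assumption, not a constant fixed by the text.

```latex
% With T = n^c, the failure probability is
T e^{-\alpha n} \;=\; e^{-\alpha n + c \ln n}.
% Whenever c \ln n \le \alpha n / 2, which holds for all sufficiently large n,
T e^{-\alpha n} \;\le\; e^{-\alpha n / 2},
% so the lemma holds with, e.g., \gamma = \alpha / 2.
```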


2.2 Coupling with Tetris 


Using a coupling argument and Lemma 3 we now prove that the maximum load in the original
process is stochastically not larger than the maximum load in the Tetris process w.h.p.

In what follows we denote by $W^{(t)}$ the set of non-empty bins at round t in the original
process. Recall that, in the latter, at every round a ball is selected from every non-empty bin
u and it is moved to a bin chosen u.a.r. Accordingly we define, for every round t, the random
variables

$$\left\{ X_u^{(t+1)} : u \in W^{(t)} \right\} \qquad (2)$$

where $X_u^{(t+1)}$ indicates the new position reached in round $t + 1$ by the ball selected in round t
from bin u. Notice that for every non-empty bin $u \in W^{(t)}$ we have that $\mathbf{P}(X_u^{(t+1)} = v) = 1/n$
for every bin $v \in [n]$. The random process $\{\mathbf{Q}^{(t)} : t \in \mathbb{N}\}$ is completely defined by the random
variables $X_u^{(t)}$; indeed we can write

$$Q_v^{(t+1)} = Q_v^{(t)} \mathbin{\dot{-}} 1 + \left|\left\{ u \in W^{(t)} : X_u^{(t+1)} = v \right\}\right| \quad \text{and} \quad W^{(t+1)} = \left\{ v \in [n] : Q_v^{(t+1)} \geq 1 \right\},$$

where we used the notation $a \mathbin{\dot{-}} b = \max\{a - b, 0\}$. Analogously, for each bin $u \in [n]$ in the Tetris
process, let $\hat{Q}_u^{(t)}$ be the random variable indicating the number of balls in bin u in round t. We
next prove that, over any polynomially-large time window, the maximum load of any bin in
our process is stochastically smaller than the maximum number of balls in a bin of the Tetris
process w.h.p. More formally, we prove the following lemma.
Lemma 4. Assume we start our process and the Tetris process from the same initial configuration
$\mathbf{q} = (q_1, \ldots, q_n)$ such that $\sum_{u=1}^{n} q_u = n$ and containing at least n/4 empty bins. Let
$T = T(n)$ be an arbitrary round and let $M_T$ and $\hat{M}_T$ be respectively the random variables
indicating the maximum loads in our original process and in the Tetris process, up to round T.
Formally

$$M_T = \max\{ Q_u^{(t)} : u \in [n],\ t = 1, 2, \ldots, T \},$$

$$\hat{M}_T = \max\{ \hat{Q}_u^{(t)} : u \in [n],\ t = 1, 2, \ldots, T \}.$$

For every $k \geq 0$ it holds that

$$\mathbf{P}(M_T \geq k) \leq \mathbf{P}(\hat{M}_T \geq k) + T \cdot e^{-\gamma n}$$

for a suitable positive constant $\gamma$.

Proof. We proceed by coupling the Tetris process with the original one round by round.
Intuitively speaking the coupling proceeds as follows:

- Case (i): the number of non-empty bins in the original process is $k \leq \frac{3}{4} n$. For each non-empty
bin u, let $i_u$ be the ball picked from u. We throw one of the $\frac{3}{4} n$ new balls of the Tetris
process in the same bin in which $i_u$ ends up. Then, we throw all the remaining $\frac{3}{4} n - k$ balls
independently u.a.r.

- Case (ii): the number of non-empty bins is $k > \frac{3}{4} n$. We run one round of the Tetris process
independently from the original one.

By construction, if the number of non-empty bins in the original process is not larger than $\frac{3}{4} n$ at
any round, case (ii) never applies and the Tetris process “dominates” the original one, meaning
that every bin in the Tetris process contains at least as many balls as the corresponding bin
in the original one. Since from Lemma 3 we know that the number of non-empty bins in the
original process is not larger than $\frac{3}{4} n$ for any time-window of polynomial size w.h.p., we thus
have that the Tetris process dominates the original process for the whole time window w.h.p.

More formally, for $t \in \{1, \ldots, T\}$, denote by $B^{(t)}$ the set of new balls in the Tetris process
at round t (recall that the size of $B^{(t)}$ is $(3/4)n$ for every $t \in \{1, \ldots, T\}$). For any round t and
any ball $i \in B^{(t)}$, let $\hat{X}_i^{(t)}$ be the random variable indicating the bin where the ball ends up.
Finally, let $\{ U_i^{(t)} : t = 1, \ldots, T,\ i \in B^{(t)} \}$ be a family of i.i.d. random variables uniform over
$[n]$.

At any round $t \in \{1, \ldots, T\}$:

If $|W^{(t-1)}| \leq (3/4)n$: Let $B_W^{(t)}$ be an arbitrary subset of $B^{(t)}$ with size exactly $|W^{(t-1)}|$, let
$f^{(t)} : B_W^{(t)} \to W^{(t-1)}$ be an arbitrary bijection and set

$$\hat{X}_i^{(t)} = \begin{cases} X_{f^{(t)}(i)}^{(t)} & \text{if } i \in B_W^{(t)} \\ U_i^{(t)} & \text{if } i \in B^{(t)} \setminus B_W^{(t)} \end{cases}$$

If $|W^{(t-1)}| > (3/4)n$: Set $\hat{X}_i^{(t)} = U_i^{(t)}$ for all $i \in B^{(t)}$.

By construction we have that the random variables

$$\left\{ \hat{X}_i^{(t)} : t \in \{1, 2, \ldots, T\},\ i \in B^{(t)} \right\}$$

are mutually independent and uniformly distributed over $[n]$. Moreover, in the joint probability
space for any k we have that

$$\mathbf{P}(M_T \geq k) = \mathbf{P}(M_T \geq k,\ \hat{M}_T \geq M_T) + \mathbf{P}(M_T \geq k,\ \hat{M}_T < M_T) \leq \mathbf{P}(\hat{M}_T \geq k) + \mathbf{P}(\hat{M}_T < M_T).$$

Finally, let $\mathcal{E}_T$ be the event “There are at least n/4 empty bins at all rounds $t \in \{1, \ldots, T\}$”
and observe that, from the coupling we have defined, the event $\mathcal{E}_T$ implies the event “$\hat{M}_T \geq M_T$”.
Hence $\mathbf{P}(\hat{M}_T < M_T) \leq \mathbf{P}(\overline{\mathcal{E}_T})$ and the thesis follows from Lemma 3. □
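The coupling in the proof can be simulated directly: in case (i) the Tetris process reuses the destinations drawn by the original process and pads them with fresh uniform choices, while in case (ii) it draws all destinations independently. The sketch below is an illustration under assumed parameters (n divisible by 4, both processes started with all balls in bin 0); the name `coupled_run` and its return values are hypothetical.

```python
import random

def coupled_run(n, rounds, seed=0):
    """Run the original and the Tetris process on a common probability space.
    Returns (dominated, fallback_rounds): whether every Tetris bin held at least
    as many balls as the original bin in every round, and how often case (ii) occurred."""
    rng = random.Random(seed)
    new_balls = 3 * n // 4
    original = [n] + [0] * (n - 1)     # same start for both processes, n - 1 empty bins
    tetris = list(original)
    dominated, fallback_rounds = True, 0
    for _ in range(rounds):
        nonempty = [u for u in range(n) if original[u] > 0]
        dest = [rng.randrange(n) for _ in nonempty]          # moves of the original process
        if len(nonempty) <= new_balls:
            # case (i): reuse the same destinations, then add independent uniform ones
            extra = [rng.randrange(n) for _ in range(new_balls - len(nonempty))]
            tetris_dest = dest + extra
        else:
            # case (ii): the Tetris round is independent of the original one
            fallback_rounds += 1
            tetris_dest = [rng.randrange(n) for _ in range(new_balls)]
        original = [q - 1 if q > 0 else 0 for q in original]
        tetris = [q - 1 if q > 0 else 0 for q in tetris]
        for v in dest:
            original[v] += 1
        for v in tetris_dest:
            tetris[v] += 1
        if any(t < q for t, q in zip(tetris, original)):
            dominated = False
    return dominated, fallback_rounds

if __name__ == "__main__":
    print(coupled_run(n=400, rounds=1000))
```

As long as case (ii) never occurs, domination holds deterministically, which mirrors the implication used at the end of the proof.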

2.3 Analysis of the Tetris process 

We begin by observing that in the Tetris process, the random variables indicating the number 
of balls ending up in a bin in different rounds are i.i.d. binomial. This fact is extremely useful 
to give upper bounds on the load of the bins, as we do in the next simple lemma, that will be 
used to prove self-stabilization of the original process. 

Lemma 5. From any initial configuration, in the Tetris process every bin will be empty at
least once within $5n$ rounds, w.h.p.

Proof. Let $u \in [n]$ be a bin with $k \leq n$ balls in the initial configuration. For $t \in \{1, \ldots, 5n\}$ let $Y_t$
be the random variable indicating the number of new balls ending up in bin u at round t. Notice
that in the Tetris process $Y_1, \ldots, Y_{5n}$ are i.i.d. $B((3/4)n, 1/n)$, hence $\mathbf{E}[Y_1 + \cdots + Y_{5n}] = (15/4)n$
and by applying Chernoff bound Q with $\delta = 1/15$ we get

$$\mathbf{P}(Y_1 + \cdots + Y_{5n} \geq 4n) \leq e^{-\alpha n}$$

where $\alpha = 1/180$.

Now let $\mathcal{E}_u$ be the event “Bin u will be non-empty for all the 5n rounds”. Since when a bin is
non-empty it loses a ball at every round, event $\mathcal{E}_u$ implies, in particular, that

$$k - 5n + Y_1 + \cdots + Y_{5n} \geq 0.$$

That is, $Y_1 + \cdots + Y_{5n} \geq 5n - k \geq 4n$. Thus

$$\mathbf{P}(\mathcal{E}_u) \leq \mathbf{P}(Y_1 + \cdots + Y_{5n} \geq 4n) \leq e^{-\alpha n}.$$

The thesis follows from the union bound over all bins $u \in [n]$. □
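The constants in the proof of Lemma 5 can be verified with exact arithmetic. The sketch below assumes the upper-tail Chernoff bound has the common form $\mathbf{P}(Y \geq (1+\delta)\mu) \leq e^{-\delta^2 \mu / 3}$; the bound actually used is stated in an appendix that is not part of this section, so this form is an assumption (it does reproduce $\alpha = 1/180$). The function name `lemma5_constants` is illustrative.

```python
from fractions import Fraction

def lemma5_constants():
    """Return (threshold / n, alpha) for the Chernoff step in the proof of Lemma 5,
    assuming the bound exp(-delta^2 * mu / 3) for the upper tail."""
    delta = Fraction(1, 15)
    mu_per_n = 5 * Fraction(3, 4)              # E[Y_1 + ... + Y_5n] / n = 15/4
    threshold_per_n = (1 + delta) * mu_per_n   # (1 + delta) * mu, in units of n
    alpha = delta ** 2 * mu_per_n / 3          # exponent rate in exp(-alpha * n)
    return threshold_per_n, alpha

if __name__ == "__main__":
    threshold, alpha = lemma5_constants()
    print("threshold / n =", threshold, " alpha =", alpha)
```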

We next focus on the maximum load that can be observed in the Tetris process at any given
bin within a finite interval of time. We note that this result could be proved using tools from
drift analysis (e.g., see [22]). We provide here an elementary and direct proof, that explicitly
relies on the Markovian structure of the Tetris process.

Let $\{X_t\}_t$ be a sequence of i.i.d. $B((3/4)n, 1/n)$ random variables and let $Z_t$ be the Markov
chain with state space $\{0, 1, 2, \ldots\}$ defined as follows

$$Z_t = \begin{cases} 0 & \text{if } Z_{t-1} = 0 \\ Z_{t-1} - 1 + X_t & \text{if } Z_{t-1} \geq 1 \end{cases} \qquad (4)$$

Observe that 0 is an absorbing state for $Z_t$ and let $\tau$ be the absorption time $\tau = \inf\{t \in \mathbb{N} : Z_t = 0\}$.
We first prove the following lemma.

Lemma 6. For any initial starting state $k \in \mathbb{N}$ and any $t \geq 8k$, it holds that

$$\mathbf{P}_k(\tau > t) \leq e^{-t/144}.$$
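Proof. Observe that

$$\mathbf{P}_k(\tau > t) = \mathbf{P}_k(Z_t > 0) \leq \mathbf{P}\left(k + \sum_{i=1}^{t} X_i - t > 0\right) \leq \mathbf{P}\left(\sum_{i=1}^{t} X_i \geq \frac{7}{8} t\right),$$

where in the last inequality we used the hypothesis $k \leq (1/8)t$. Since the $X_i$s are i.i.d. binomial
$B((3/4)n, 1/n)$, it follows that $\sum_{i=1}^{t} X_i$ is binomial $B((3/4)nt, 1/n)$ and from the Chernoff bound
with $\delta = 1/6$ we have that

$$\mathbf{P}\left(\sum_{i=1}^{t} X_i \geq \frac{7}{8} t\right) \leq e^{-\frac{(1/6)^2}{3} \cdot \frac{3}{4} t} = e^{-t/144}.$$

□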
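Lemma 6 can be sanity-checked empirically by simulating chain (4) and estimating the tail of the absorption time. This is a rough Monte Carlo sketch, not a proof: the names `survives` and `tail_estimate`, the parameters, and the number of trials are illustrative assumptions.

```python
import math
import random

def survives(k, n, t, rng):
    """Return True if chain (4), started at Z_0 = k, is still positive after t rounds."""
    z = k
    for _ in range(t):
        if z == 0:                      # 0 is absorbing
            return False
        # X_t ~ B((3/4)n, 1/n), sampled as a sum of Bernoulli trials
        arrivals = sum(1 for _ in range(3 * n // 4) if rng.random() < 1 / n)
        z = z - 1 + arrivals
    return z > 0

def tail_estimate(k, n, t, trials, seed=0):
    """Monte Carlo estimate of P_k(tau > t)."""
    rng = random.Random(seed)
    return sum(survives(k, n, t, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    k, n = 5, 40
    t = 8 * k
    print("estimate:", tail_estimate(k, n, t, trials=2000),
          " bound e^(-t/144):", math.exp(-t / 144))
```

For small t the bound $e^{-t/144}$ is far from tight, so the estimate is expected to sit well below it; the bound becomes useful when t is a large multiple of $\log n$.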


Now we can easily prove the following statement on the Tetris process. 

Lemma 7. Let c be an arbitrarily large constant, and let the Tetris process start from any
legitimate configuration. The maximum load is $O(\log n)$ for all $t = O(n^c)$, w.h.p.

Proof. Consider an arbitrary bin u that is non-empty in the initial legitimate configuration.
Let $\hat{Q}^{(0)} = O(\log n)$ be its initial load³ and let $\tau = \inf\{t : \hat{Q}^{(t)} = 0\}$ be the first round the bin
becomes empty. Observe that, for any $t \leq \tau$, $\hat{Q}^{(t)}$ behaves exactly as the Markov chain defined
in (4). Hence, from Lemma 6 it follows that for every constant $\hat{c}$ such that $\hat{c} \log n \geq 8 \hat{Q}^{(0)}$ we
have

$$\mathbf{P}_{\hat{Q}^{(0)}}(\tau > \hat{c} \log n) \leq n^{-\hat{c}/144}. \qquad (5)$$

Thus, within $O(\log n)$ rounds the bin will be empty w.h.p., and since the load of the bin decreases
by at most one unit per round, the load of the bin is $O(\log n)$ for all such rounds w.h.p.

Next, define a phase as any sequence of rounds that starts when the bin becomes non-empty
and ends when it becomes empty again. Notice that, by using a standard balls-into-bins
argument, in the first round of each phase the load of the bin will be $O(\log n / \log\log n)$ w.h.p.
Moreover, in any phase the load of the bin can be coupled with the Markov chain in (4). Hence,
for any arbitrarily large constant c we can choose the constant $\hat{c}$ in (5) large enough so that, by
taking the union bound over all phases up to round $n^c$, the load of the bin is $O(\log n)$ in all
rounds $t \leq n^c$ w.h.p.

Finally, observe that for any bin that is initially empty the same argument applies with the
only difference that the first phase for the bin does not start at round 0 but at the first round
the bin becomes non-empty. The thesis thus follows from a union bound over all the bins. □
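The statement of Lemma 7 can be observed in a small experiment that tracks the largest load ever seen in the Tetris process. The sketch below is illustrative only: the function name `tetris_max_load`, the starting configuration (half of the bins holding two balls each) and the parameter values are assumptions.

```python
import math
import random

def tetris_max_load(n, rounds, seed=0):
    """Largest load observed in any bin of the Tetris process over `rounds` rounds,
    starting from an assumed legitimate configuration (n divisible by 4)."""
    rng = random.Random(seed)
    loads = [2 if u < n // 2 else 0 for u in range(n)]   # max load 2, n/2 empty bins
    worst = max(loads)
    for _ in range(rounds):
        loads = [q - 1 if q > 0 else 0 for q in loads]   # one ball leaves each non-empty bin
        for _ in range(3 * n // 4):                      # (3/4)n new balls, placed u.a.r.
            loads[rng.randrange(n)] += 1
        worst = max(worst, max(loads))
    return worst

if __name__ == "__main__":
    n = 256
    print("max load:", tetris_max_load(n, rounds=2000), " log2(n):", math.log2(n))
```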

2.4 Back to the original process: Proof of Theorem 1

From a standard balls-into-bins argument (see, e.g., [33]), starting from any legitimate configuration,
after one round the process still lies in a legitimate configuration w.h.p. and, thanks to
Lemma 2, there are at least n/4 empty bins w.h.p. From Lemma 4 with $T = O(n^c)$, we have
that the maximum load of the repeated balls-into-bins process does not exceed the maximum
load of the Tetris process in all rounds $1, \ldots, T$, w.h.p. Finally, the upper bound on the
maximum load of the Tetris process in Lemma 7 completes the proof of the first statement of
Theorem 1.

As for self-stabilization, given an arbitrary initial configuration, Lemma 5 implies that within
$O(n)$ rounds, all bins have been emptied at least once, w.h.p. When a bin becomes empty,
Lemma 7 ensures that its load will be $O(\log n)$ over a polynomial number of rounds. Hence,
within $O(n)$ rounds, the system will reach a legitimate configuration, w.h.p. □

3 We omit the subscript u in the remainder of this proof since clear from context. 






3 Parallel Resource Assignment 


As mentioned in the introduction, the repeated balls-into-bins process can also be seen as
running parallel random walks of n distinct tokens (i.e. balls), each of them starting from a
node (i.e. bin) of the complete graph of size n. This is a randomized protocol for the parallel
allocation problem where tokens represent different resources/tasks that must be assigned to
all nodes in mutual exclusion m- In this scenario, a critical complexity measure is the (global)
cover time, i.e., the time required for every token to visit all nodes.

It is important to observe that our analysis of the maximum load works for anonymous tokens
and nodes and, hence, for any particular queuing strategy. Under the FIFO strategy, no token
spends in a bin a number of rounds exceeding the load of that bin at the time the token entered it. Theorem 1
then implies that, after an initial stabilizing phase of $O(n)$ rounds, every token will spend at
most a logarithmic number of rounds in any bin queue it traverses and over any period of
polynomial length, w.h.p. We also know that the cover time of the single random-walk process
is w.h.p. $O(n \log n)$ (see, e.g., [53j). Combining the above two facts, we easily get the following,
almost tight result on the Parallel Resource Assignment problem.

Corollary 8. The random-walk protocol for the Parallel Resource Assignment problem on the
clique has cover time $O(n \log^2 n)$, w.h.p.
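The protocol behind Corollary 8 can be sketched with explicit FIFO queues, so that individual tokens and the set of nodes each one has visited are tracked. This is an illustrative simulation under assumed conventions (token u starts at node u, the starting node counts as visited, and arrivals within a round are enqueued in node order); the function name `parallel_cover_time` and the safety cap `max_rounds` are hypothetical.

```python
import random
from collections import deque

def parallel_cover_time(n, seed=0, max_rounds=10**6):
    """Number of rounds until every one of n tokens has visited all n nodes
    of the complete graph, with one FIFO queue per node."""
    rng = random.Random(seed)
    queues = [deque([u]) for u in range(n)]     # token u starts at node u
    visited = [{u} for u in range(n)]           # nodes seen so far by each token
    remaining = n * (n - 1)                     # (token, node) pairs still to be covered
    rounds = 0
    while remaining > 0 and rounds < max_rounds:
        rounds += 1
        # every non-empty node releases the token at the head of its queue
        moves = [(queues[u].popleft(), rng.randrange(n)) for u in range(n) if queues[u]]
        for token, v in moves:
            queues[v].append(token)
            if v not in visited[token]:
                visited[token].add(v)
                remaining -= 1
    return rounds

if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, parallel_cover_time(n))
```

Each token moves at most once per round, so the returned value is never smaller than n - 1.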

Adversarial model. 

The self-stabilization property shown in Theorem 1 makes the random-walk protocol robust 
to transient faults. We can consider an adversarial model in which, in some faulty rounds, an 
adversary can reassign the tokens to the nodes in an arbitrary way. Then, the linear convergence 
time shown in Theorem 1 implies that the O(n log^2 n) bound on the cover time still holds 
provided that faulty rounds occur at most once every γn rounds, for any constant 
γ ≥ 6. Indeed, thanks to Lemma El, the action of an adversary manipulating the system 
configuration once every γn rounds can affect only the subsequent 5n rounds, while our analysis 
in the non-adversarial model does hold for the remaining (γ − 5)n rounds. It follows that the 
overall slowdown on the cover time produced by such an adversary is at most a constant factor 
on the previous O(n log^2 n) upper bound, w.h.p. 
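
The constant factor can be made explicit with a rough accounting. This is only a sketch: it makes the simplifying assumption that tokens make no useful progress during the affected rounds, and the symbols $T$ (cover time without faults) and $T_{\mathrm{adv}}$ (cover time with faults) are introduced here purely for illustration.

```latex
\[
\frac{\text{unaffected rounds}}{\text{all rounds}}
  \;\ge\; \frac{(\gamma - 5)\,n}{\gamma\, n}
  \;=\; 1 - \frac{5}{\gamma}
  \;\ge\; \frac{1}{6}
  \qquad (\gamma \ge 6),
\]
\[
T_{\mathrm{adv}} \;\le\; \frac{\gamma}{\gamma - 5}\, T \;\le\; 6\,T,
  \qquad T = O(n \log^2 n).
\]
```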

4 Conclusions and Open Questions 

In this paper, we showed that the repeated balls-into-bins process is self-stabilizing when the number m 
of balls equals the number n of bins (obviously, this is still the case whenever m < n). An 
interesting open question is whether this result extends to larger values of m, i.e., for any 
m = O(n log n). We believe an approach based on a lower bound on the number of empty bins 
might still work. Simulation results for increasing values of n (up to n ≈ 10^5) show that the 
number of empty bins is still compatible with a linear function, even though the standard deviation in 
our experiments turned out to be relatively large. 
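
An experiment of this kind can be sketched as follows. This is not the code used for the reported simulations: the function name `empty_bin_fraction`, the uniformly random starting placement, and the choice to average over the second half of the run are all assumptions made for illustration. Since balls are anonymous for this purpose, only bin loads are tracked.

```python
import random


def empty_bin_fraction(n, m, rounds, seed=0):
    """Average fraction of empty bins over the second half of `rounds` rounds.

    Illustrative sketch for m balls and n bins (requires rounds >= 1).
    """
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(m):                   # assumed start: uniform random placement
        load[rng.randrange(n)] += 1
    total, samples = 0.0, 0
    for t in range(rounds):
        nonempty = [i for i in range(n) if load[i] > 0]
        for i in nonempty:               # one ball leaves every non-empty bin
            load[i] -= 1
        for _ in nonempty:               # each released ball picks a uniform bin
            load[rng.randrange(n)] += 1
        if t >= rounds // 2:             # skip the initial transient
            total += load.count(0) / n
            samples += 1
    return total / samples
```

Calling `empty_bin_fraction(n, m, rounds)` for growing n, with m = n or with m somewhat larger than n, gives an estimate of whether the number of empty bins stays proportional to n; several seeds are needed to judge the spread.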

A more general interesting question is the study of this process over more general graph 
classes. This line of research is also motivated by several recent applications of parallel random 
walks in the (uniform) gossip model [Hl ll4l[2Ull21| . As mentioned in the introduction, a previous 
analysis of this process provides a bound O(√t) on the maximum load after t rounds on 
regular graphs [7]. We believe this previous bound for regular graphs is far from tight and 
it leads to rough bounds on parallel cover times on these networks. We conjecture that the 
maximum load remains logarithmic for a long period in any regular graph. A possible reason 
for this phenomenon (if true) might be that the expected difference between (token) arrivals 
and departures is always non-positive at every node in regular graphs. As highlighted in our 
analysis of the complete graph, this fact alone is not enough but it could be combined with 
a suitable bound on the number of empty bins, in order to prove our conjecture in this more 
general case. Unfortunately, non-complete graphs present a further technical issue: in order to 
apply any argument based on the presence of empty bins, not only do we need to argue about 
their number, but also about their distribution across the network. This technical issue seems 
to be far from trivial even on simple topologies such as rings. 

Finally, a technical question concerns the tightness of our bound on the maximum load. In 
the classical (one-shot) balls-into-bins problem, it is well known that the maximum load of the 
bins is Θ(log n / log log n) w.h.p. One may wonder whether our O(log n) upper bound on the 
maximum load of the repeated process for a polynomial number of rounds is tight, or whether it can be 
improved to O(log n / log log n). We conjecture that, within any polynomial time window, the 
probability that the maximum load asymptotically exceeds log n / log log n is non-negligible. 
Appendix 

A Useful inequalities 

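Lemma 9 (Chernoff bound). Let $\{X_t : t \in [n]\}$ be a family of independent binary random 
variables. Let $X = \sum_{t=1}^{n} X_t$ and let $\mu_L \le E[X] \le \mu_H$. For every $\delta \in (0,1)$ it holds that 

\[ P\bigl(X \le (1 - \delta)\mu_L\bigr) \le \exp\left(-\frac{\delta^2}{2}\,\mu_L\right) \tag{6} \]

\[ P\bigl(X \ge (1 + \delta)\mu_H\bigr) \le \exp\left(-\frac{\delta^2}{3}\,\mu_H\right) \tag{7} \]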
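
As a sanity check, the sketch below compares exact binomial tails with the usual multiplicative Chernoff bounds, exp(-δ²μ/2) for the lower tail and exp(-δ²μ/3) for the upper tail. It assumes identically distributed variables and takes both mean bounds equal to E[X] = np; the helper names `binom_cdf` and `check_chernoff` are introduced here for illustration.

```python
from math import ceil, comb, exp, floor


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def check_chernoff(n, p, delta):
    """Return ((lower_tail, lower_bound), (upper_tail, upper_bound)).

    Uses mu_L = mu_H = n * p, i.e. the exact mean in both bounds.
    """
    mu = n * p
    # P(X <= (1 - delta) * mu), evaluated at the largest integer not above it.
    lower_tail = binom_cdf(floor((1 - delta) * mu), n, p)
    # P(X >= (1 + delta) * mu) = 1 - P(X <= ceil((1 + delta) * mu) - 1).
    upper_tail = 1.0 - binom_cdf(ceil((1 + delta) * mu) - 1, n, p)
    lower_bound = exp(-(delta**2) * mu / 2)
    upper_bound = exp(-(delta**2) * mu / 3)
    return (lower_tail, lower_bound), (upper_tail, upper_bound)
```

For example, `check_chernoff(100, 0.5, 0.2)` returns the two exact tails next to their bounds; the bounds are valid but far from sharp at this scale.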
B Negative association 


Definition 10 (Negative association). Random variables $X_1, \ldots, X_n$ are negatively associated 
if, for every pair of disjoint subsets $I, J \subseteq [n]$, it holds that 

\[ E\bigl[f(X_i, i \in I) \cdot g(X_j, j \in J)\bigr] \le E\bigl[f(X_i, i \in I)\bigr] \cdot E\bigl[g(X_j, j \in J)\bigr] \]

for all pairs of functions $f : \mathbb{R}^{|I|} \to \mathbb{R}$ and $g : \mathbb{R}^{|J|} \to \mathbb{R}$ that are both non-decreasing or both 
non-increasing. 

Now we give a simple counterexample showing that, in our balls-into-bins process, the 
random variables counting the number of balls arriving in a given bin in different rounds cannot 
be negatively associated. 

Consider our random process with $n = 2$ and let $X_1$ and $X_2$ be the random variables 
indicating the number of tokens arriving at the first bin in rounds 1 and 2, respectively. Let 
$f = g$ be the non-increasing function 

\[
f(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x > 0 \end{cases}
\]

If $X_1$ and $X_2$ were negatively associated, we would thus have $P(X_1 = 0, X_2 = 0) \le 
P(X_1 = 0)\,P(X_2 = 0)$. However, by direct calculation it is easy to see that 

\[ P(X_1 = 0, X_2 = 0) = 1/8 \]

because, in order for "$X_1 = 0, X_2 = 0$" to happen, at the first round both balls have to end up 
in the second bin (this happens with probability 1/4) and at the second round the ball chosen 
in the second bin has to stay there (this happens with probability 1/2). But we have that 
$P(X_1 = 0) = 1/4$ and, by conditioning on all three possible configurations at round 1, we 
have $P(X_2 = 0) = 3/8$. Thus 

\[ \frac{1}{8} = P(X_1 = 0, X_2 = 0) > P(X_1 = 0)\,P(X_2 = 0) = \frac{1}{4} \cdot \frac{3}{8} \]

In general, intuitively speaking, it seems that the event "$X_t = 0$" makes it more likely that 
there are many empty bins in the system, which in turn makes more likely the event "$X_{t+1} = 0$" 
that the bin will receive no tokens at round $t + 1$ as well. 
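
The three probabilities in the counterexample can be checked by exhaustive enumeration with exact rational arithmetic. The sketch below assumes the process starts with one ball in each bin, which is the starting configuration consistent with both balls moving in round 1; the function names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product


def one_round(load):
    """Yield (probability, arrivals at bin 1, new load) for one round, n = 2."""
    movers = sum(1 for x in load if x > 0)       # one ball leaves each non-empty bin
    base = tuple(max(x - 1, 0) for x in load)
    for dests in product((0, 1), repeat=movers):  # index 0 is the first bin
        new = list(base)
        for d in dests:
            new[d] += 1
        yield Fraction(1, 2) ** movers, dests.count(0), tuple(new)


def joint_distribution(start=(1, 1)):
    """Exact joint law of (X1, X2), from the assumed start of one ball per bin."""
    dist = {}
    for p1, x1, load1 in one_round(start):
        for p2, x2, _ in one_round(load1):
            dist[(x1, x2)] = dist.get((x1, x2), Fraction(0)) + p1 * p2
    return dist


dist = joint_distribution()
p_joint = dist.get((0, 0), Fraction(0))                         # 1/8
p_x1 = sum(p for (x1, _), p in dist.items() if x1 == 0)         # 1/4
p_x2 = sum(p for (_, x2), p in dist.items() if x2 == 0)         # 3/8
print(p_joint, p_x1, p_x2, p_joint > p_x1 * p_x2)
```

The enumeration gives a joint probability of 1/8 against a product of 3/32, so the inequality required by negative association fails.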